clear
set more off
macro drop _all
capture log close

/********************************************************************************
Discrimination in Multi-Phase Systems: Evidence from Child Protection
Build the Analysis Sample To Include Only Investigations Which Entered the 
Rotation System

Created on: 2/26/19

Last Modified on: 2/20/2024

Description: Start with master child*case level DHHS file and limit sample to only
	     investigations in which a caseworker is likely to have been randomly
	     assigned.   
	

Note that we have removed the file directory names from this program for 
confidentiality reasons.
********************************************************************************/

** Setting the Directory
global rawdata 
global cleandata 
global tmp 

/********************************************************************************

Start with the master child*case level file and limit the sample to only cases 
which would have entered the rotation system beginning in 2008 (when worker data
is available).

This includes:
(1) The FIRST EVER investigation for each child
(2) Any investigation which happened MORE THAN 1 YEAR (365 days) after the previous
investigation (these are likely to enter the rotation system).
	
There are some important sample restrictions: 

-(a) drop all investigations which occurred AFTER a child was placed into foster care

-(b) drop children born before 8/1/96 when I don't have data about investigations or 
foster care to ensure that I'm not including children who were ever in foster care
before the data starts

-(c) drop children born after 12/31/2012 because during the May 2016 match, Daniel
Hubbard and Andrew Moore did not include any children born after this date. Therefore,
any child born after 12/31/2012 would necessarily not match to a ric.

*******************************************************************************/

**DEFINE HOW MANY DAYS NEED TO PASS TO CONSIDER A CASE AS ENTERING THE ROTATION SYSTEM
local days_since_last_inv=365

***********************
***(1) SAMPLE RESTRICTIONS
***********************

**Start with the master child*case level DHHS file 
use "${cleandata}master_dhhs.dta", clear

**Keep only relevant variables
keep vicid inv_caseid complaint_date complaint_year dob worker_id ///
	worker_county zipcode_vic fc fc_ever categ sexab removal* fc_enddt

**Merge on information about pre-2008 investigations (top-code at 10 because of 
**implausible outliers; missing counts are left missing rather than set to 10)
sort vicid
merge m:1 vicid using "${cleandata}complaint_date_pre2008.dta"
replace n_inv_pre2008=0 if _merge==1
replace n_inv_pre2008=10 if n_inv_pre2008>10 & !missing(n_inv_pre2008)
drop if _merge==2
drop _merge

**Create variable for number of investigations before the current one
bysort vicid (complaint_date): gen n_inv_post2008=_n
gen n_prior_inv=n_inv_pre2008+n_inv_post2008-1
drop n_inv_pre2008 n_inv_post2008 
la var n_prior_inv "# Prior Investigations with CPS"

**Identify cases which were most likely to have been randomly assigned to caseworkers
**(i.e., entered the rotation system)
gen enter_rotation=.
la var enter_rotation "Case was likely to have entered the rotation system"

**(a) FIRST EVER INVESTIGATIONS POST 2008
gegen complaint_date_min=min(complaint_date) if complaint_date_pre2008==., by(vicid)
replace enter_rotation=1 if complaint_date==complaint_date_min
drop complaint_date_min

**(b) MORE THAN 1 YEAR BETWEEN A PRE 2008 AND POST 2008 INVESTIGATION
sort vicid complaint_date
bysort vicid: gen n=_n
gen first_post2008=(n==1)
gen date_diff=complaint_date-complaint_date_pre2008 if first_post2008
replace enter_rotation=1 if date_diff>`days_since_last_inv' & date_diff!=.
drop first_post2008 date_diff

**(c) MORE THAN 1 YEAR APART BETWEEN POST 2008 INVESTIGATIONS 
sort vicid complaint_date
gen date_diff=complaint_date-complaint_date[_n-1] if vicid==vicid[_n-1]
replace enter_rotation=1 if date_diff>`days_since_last_inv' & date_diff!=. 
drop date_diff

**Limit to only cases which might have entered the rotation system
keep if enter_rotation==1
drop enter_rotation

*duplicates report vicid complaint_date
**Fewer than 1% of youth have 2 different cases opened on the same day. In these 
**instances, are the cases assigned to the same worker?  If so, the number of duplicates
**should be identical
*duplicates report vicid complaint_date worker_id
/*
It looks like for the most part, any cases which are started on the same day are
assigned to the same worker but there are a small number of investigations in which
they are started on the same day but assigned to different workers.  My guess is 
this is just an idiosyncratic error with having to sometimes randomly assign workers.

With this, I will keep any case which resulted in foster care. And then randomly 
choose which case to keep.
*/
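**Flag same-day cases that disagree on foster care. Sorting on fc within each
**group makes fc[1] the smallest and fc[_N] the largest value in the group.
bysort vicid complaint_date (fc): gen flag=1 if fc[1]!=fc[_N] 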
drop if flag==1 & fc==0

set seed 11223344
gen random=runiform()
bysort vicid complaint_date: egen rand_min=min(random)
drop if random!=rand_min
drop n flag rand* complaint_date_pre2008
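
**Optional check (an illustrative addition, not part of the original workflow):
**after the steps above, each child should have at most one investigation per
**complaint date. capture noisily reports a violation without halting the
**program; missok permits missing values in the key variables.
capture noisily isid vicid complaint_date, missok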

**RESTRICTION 1: DROP INVESTIGATIONS WHICH HAPPENED AFTER A CHILD WAS PLACED IN 
**FOSTER CARE

*****(a) Children who were placed in FC before 2008
sort vicid
merge m:1 vicid using "${cleandata}removal_date_pre2008.dta"
**Keep only children with no pre-2008 removal record
keep if _merge==1
drop _merge

*****(b) Children who were placed in FC after 2008
sort vicid complaint_date
bysort vicid: egen fc_max=max(fc)
**assign a child as being placed in foster care if a subsequent investigation that 
**did not enter the rotation eventually led to foster care placement (e.g., 2 invs 
**within 1 week, and the second led to foster care)
bysort vicid (complaint_date): replace fc=1 if _n==_N & fc_ever==1 & fc_max==0
drop fc_max

**Bring in removal dates from the living arrangements file for anyone missing them
tempfile master
save `master'
use "$cleandata/living_arrangement_clean.dta", clear
keep vicid fc_startdt 
duplicates drop
merge 1:m vicid using `master'
drop if _merge==1
replace removal_date=fc_startdt if fc==1 & removal_date==.
replace removal_date=complaint_date if fc==1 & removal_date==.
drop fc_startdt _merge

**drop inv's after first one that led to foster care placement
gen complaint_fc=complaint_date if fc==1
bysort vicid: egen complaint_fc_min=min(complaint_fc)
drop if complaint_date>complaint_fc_min
drop complaint_fc*

**RESTRICTION 2: DROP CHILDREN WHO WERE BORN BEFORE AUGUST 1996
drop if dob<date("8-1-1996","MDY")

**RESTRICTION 3: DROP CHILDREN WHO WERE BORN AFTER 12/31/2012
drop if dob>date("12-31-2012","MDY")
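
**Optional check (an illustrative addition, not part of the original workflow):
**every remaining child should have a nonmissing dob between 8/1/1996 and
**12/31/2012. Note that restriction 3 also drops missing dob, because Stata
**treats missing as larger than any number. This assumes dob is a numeric
**Stata daily date, as the comparisons above imply.
capture noisily assert inrange(dob, mdy(8,1,1996), mdy(12,31,2012))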

**Save analysis sample of investigations which would have entered the rotation 
**system
sort vicid complaint_date
save "${cleandata}analysis_sample.dta", replace